ABSTRACT

The purpose of this study is to shed light on the capabilities for storing, analysing and sharing big data in 
developing countries. The study takes an in-depth look at adoption of big data as a technological innovation, as 
well as the adoption issues for Big Data, its availability and access. The paper presents a review of academic 
literature, policy documents from international agencies and reports from industry in order to assess the 
diffusion and adoption of big data innovation in developing countries. The study was broadened by a Google 
Scholar search for relevant literature, using combinations of the following key words: big data 
and analytics, developing countries, and diffusion of innovations. Diffusion of innovations can greatly 
accelerate adoption and utilization of Big Data, even though developing countries face challenges 
that limit their capability to utilize these technologies effectively. The paper presents the Diffusion of 
Innovations theoretical framework for the study of Big Data innovation adoption in developing countries. The 
study concludes that the diffusion theory concepts provide an effective mechanism for policy leaders in 
developing countries to maximize adoption of Big Data innovations, and can also be used in informing policy 
implementers on how to increase adoption rates for Big Data. 

I. Introduction

Innovations in technology and greater affordability of digital devices worldwide have ushered in an age 
of Big Data [1]. [2] describes Big Data as an umbrella term for the large amounts of digital data 
continually generated by the global population: the explosion in the quantity and diversity of 
high-frequency digital data, and the innovations which allow for its storage and analysis. Digital data is being 
produced in real time at an unprecedented rate across the developing world as people go about their daily 
lives. Globally, 98 percent of all stored data today is in digital form, whereas in the year 2000 only a 
quarter of the world’s stored information was digital and the rest was preserved on paper and other analogue 
media such as film [1]. According to [2], the speed and frequency with which data is produced and collected by 
an increasing number of sources is responsible for today’s data deluge: on a global scale, the amount of 
available digital data is projected to increase at a rate of 40% annually. A substantial proportion of this output is 
derived from records generated as a by-product of everyday interactions with digital products or services, and is 
popularly referred to as data exhaust. 
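
As a rough illustration of what a 40% annual growth rate implies, the short Python sketch below compounds that rate to obtain a doubling time and a multi-year growth multiple. Only the 40% figure comes from the text; the function names and the ten-year horizon are assumptions made for illustration, and constant compound growth is itself a simplification.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double under constant compound growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def growth_multiple(annual_growth_rate: float, years: int) -> float:
    """Factor by which a quantity grows after `years` of compound growth."""
    return (1 + annual_growth_rate) ** years


# 40% per year is the projection quoted in the text; the 10-year horizon
# is an assumption chosen only to make the example concrete.
RATE = 0.40
print(f"doubling time: {doubling_time_years(RATE):.2f} years")
print(f"growth over 10 years: x{growth_multiple(RATE, 10):.1f}")
```

Under these assumptions the volume of data doubles roughly every two years.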

Big Data is described by [3] as consisting of very large, distributed aggregations of loosely structured 
data that are often incomplete and inaccessible. Its characteristics include terabytes or petabytes of data drawn from 
billions or trillions of records and from millions or billions of people; loosely structured and often distributed data; 
flat schemas with few complex interrelationships; and data that often involve time-stamped events, are often 
incomplete, and often include connections between data elements that must be probabilistically inferred. [3] 
observed that Big Data applications can be transactional in nature (for example, Facebook) or 
analytic (for example, call center analytics). Big Data therefore refers to digital datasets of unprecedented size in 
relation to a particular question or phenomenon, and particularly datasets that can be linked, merged and 
analysed in combination. [4] suggested that it is more relevant to define big data as involving a process of 
analysis that characterises the data involved as big, rather than as a particular size of product. 

Big Data is typically defined in terms of 3 Vs, a designation originally developed by Gartner [5]: 
Volume, Velocity, and Variety. Volume refers to the exponential increase in the amount of data that are 
generated and stored. It is estimated that data production will be 44 times greater in 2020 than it was in 2009. A 
Big Data challenge of volume arises when the amount of data is too large to process 
with traditional computing techniques. Velocity refers to the speed at which data is processed. The growth of 
integrated sensors in all types of devices, and the increasing adoption rates of mobile phones worldwide, 
contribute to the continuous influx of data. Big Data projects seek to use these data to enable decision making in 
real time, which means a Big Data challenge arises when data arrives too quickly to process 
with traditional computing techniques. Variety refers to the complexity of the data and describes the different 
formats data can adopt, such as images, free text, video, and sound, among others. Big Data tries to harness 
existing data even if they are unstructured or in non-standard formats, so a Big Data 
challenge arises when the data involves complex problems such as high dimensionality, data from many sources, or 
data having many different data structures: all of these problems can cause difficulty in processing with 
traditional computing techniques [6]; [7]. Big Data has also been defined in terms of 5 Vs, which add Veracity 
and Value to the existing 3 Vs of Volume, Velocity, and Variety [8]. Veracity refers to the confidence 
level associated with certain types of data; it accounts for the correctness of the data and can include data quality 
problems such as noise or missing values. Value reflects the fact that if particular data does 
not provide significance or value, it is not relevant for Big Data analysis. See Fig. 1. NIST defines big data as 
data of which the data volume, acquisition speed, or data representation limits the capacity of using traditional 
relational methods to conduct effective analysis or the data which may be effectively processed with important 
horizontal zoom technologies [9]. This paper adopts the NIST definition. 

The science of Big Data deals not just with the volume and velocity of data, but also with its heterogeneity, i.e. 
levels of granularity, media formats, scientific disciplines involved, and complexity [10]. The heterogeneity 
of data is an important issue in big data since data is large in volume and produced at high speed, and as such 
it is necessary to address heterogeneity-related challenges through modelling and integration of big data [11]. 

Fig. 1: The real world use of Big Data, source: [12] 

This study analyzes ways in which the Diffusion of Innovations Theory [13] represents an attractive framework for 
the adoption and utilization of big data in developing countries. According to [15], developing countries are 
those classified as low income and lower middle income in the World Bank categorization of countries. The diffusion of 
Big Data is associated with and facilitated by measures taken by the providers of prescriptive planting 
technology to strengthen their resources and capabilities [16]. The Diffusion of Innovations Theory has the 
capacity to provide a general framework for Big Data researchers and users, since it is predictive and has 
potential for guiding the adoption and utilization of Big Data projects in developing countries. This view is supported 
by [17]; [18], who held the opinion that diffusion concepts can be operationalised in projects to affect the rate 
of adoption of innovations by slowing their spread or, more commonly, by accelerating it. 

II. Background 

Advances in computing and data science in the recent past have made it possible to process and 
analyze Big Data in real time. This data revolution, initially restricted to the industrialized world, is now being 
experienced in developing countries [2]. The spread of mobile phone technology to the hands of billions of 
individuals, for example, may be the single most significant innovation that has affected developing countries in 
the past decade. Mobile technology is used as a substitute for weak telecommunications, poor transport 
infrastructure, and underdeveloped financial and banking systems. In Kenya, for instance, the greatest potential 
for data capture lies in mobile phones, since they are used daily to transfer money, buy and sell goods, 
and communicate information including examination results, stock levels and prices of commodities [2]; [19]. In 
many developing countries, however, the volume of data, the velocity with which it is generated, and its 
variety and lack of structure often hinder its use. This creates the need to change the way information is 
captured, stored, processed, and analyzed, leading to the paradigm shift called Big Data [12]. [16] observed that 
firms are emerging in these countries that provide products, services, software, and solutions related to 
Big Data. One example is a technology developed by the Brazilian company Cignifi, which can recognize 
patterns in consumers’ phone calls, text messages, and data usage, and use them to predict lifestyle and credit risk 
profiles [20]. 

[21] noted that Big Data is characterized by being generated continuously, seeking to be exhaustive and fine- 
grained in scope, and flexible and scalable in its production. Examples of the production of such data include: 
digital CCTV; the recording of retail purchases; digital devices that record and communicate the history of their 
own use e.g. mobile phones; the logging of transactions and interactions across digital networks e.g. email or 
online banking; click stream data that records navigation through a website or application; measurements from 
sensors embedded into objects or environments; the scanning of machine-readable objects such as travel passes 
or barcodes; and social media postings. These sources provide a good input of data for Big Data projects. 
According to [4], the data of interest for big data projects can be classified into three main classes of 
origin: 1) explicitly provided data, e.g. social media postings, digital survey responses or volunteered 
geographical information for open mapping; 2) observed data, such as transactions taking place online or mobile 
phone call and location records; 3) data inferred and derived by algorithms, e.g. people’s social network 
structure, trends relating to behaviour or transactions, and economic data such as inflation trends. In recognition 
of the potential of Big Data, the UN has called for a data revolution to underpin new development goals so that 
sustainable development practitioners can better track advances, integrate evidence into decision-making and 
provide more transparency [22]. According to [22], the explosion of big data has far outpaced our ability to 
make sense of it in all countries, but most of all in poorer nations that already lack human and technical 
capacity. 


A recent UN Economic and Social Council report looking at 107 national statistical offices showed that 
they see big data projects as a complement to traditional collection methods such as surveys (United Nations 
Economic and Social Council, 2013). The report showed that more than half of the world’s states have plans to 
explore new uses for administrative data such as tax, customs and social security records, as well as data from 
sources such as social media, internet searches and the Global Positioning System. For example, social media is 
already used by the Ghanaian and Mexican governments to track public perception and the credibility of their 
administrations [22]. Big Data projects have been attempted in Kenya; one example is the Nairobi geo-coded 
cell phone transaction data being used by the Engineering Social Systems (ESS) project to model the growth of 
slums, with the objective of helping the government to optimize the allocation of resources for infrastructural 
development and other needs [23]. In 2011, Kenya launched an Open Data Portal (ODP) with the help of the 
World Bank. The project received support at the highest levels of the government. The data in the ODP includes 
a full digital edition of the 2009 census, government expenditure for 12 years, household income surveys, and 
data about the location of schools and health facilities [16]. 

In many less developed countries, the inability to effectively adopt 21st-century digital innovations 
hampers progress toward modernization and lowers their chances, as nations or collections of individuals, 
of participating and competing in the global, knowledge-driven economy. Therefore, in order to effectively address 
the digital divide and increase the sustainability of ICT projects such as Big Data projects in less developed 
countries, it is imperative that scholars, policy makers, and practitioners understand research initiatives on 
the diffusion of information technologies from the perspectives of various stakeholders [24]. The UN survey found 
that governments in the developing world face a number of problems when it comes to implementing big data 
projects. Concerns include legal questions about privacy and access to data, lack of human capacity, and scaling 
IT infrastructure to cope with the demands of large data sets. For example, the statistical office in Kenya, 
despite the country being one of Africa’s leaders in information technology, lacks the expertise to train staff to 
use big data, as well as awareness of the technology needed for analysis [25]. Lack of financial resources is also a major 
concern in developing countries: the proportion of overseas aid dedicated to statistical programmes was 
cut by half between 2011 and 2012, to 0.16 per cent, according to a 2013 report from the Partnership in 
Statistics for Development in the 21st Century (PARIS21) [26]. 

The goals of big data are to perform efficient analytics and to derive new information from the base 
data. Big data analytics is a type of quantitative research that examines large amounts of data to uncover hidden 
patterns, unknown correlations and other useful information [2]. Data analytics refers to the discovery of useful 
patterns in data and their effective communication to the user, for instance through appropriate interactive 
visualization techniques. This process usually employs a mix of methodologies from different research 
communities, such as statistics, machine learning, data mining and visual analytics, and is aimed at a broad 
range of application tasks, including data summarization, classification and prediction, correlation analysis, etc. 
[10]. 

III. Diffusion of Big Data 

The Diffusion of Innovations Theory is useful in providing an account of how technological 
innovations such as Big Data move from the stage of invention to widespread use, or fail to do so. A review of literature 
on the Diffusion of Innovations Theory developed by [13] indicates that it can serve as a theoretical guideline 
for studying factors shaping the adoption and utilisation of big data in developing countries. According to [13], 
an innovation is an idea, practice, or object that is perceived as new by an individual or another unit of adoption, 
and diffusion is the process by which an innovation is communicated through certain channels over time among 
the members of a social system. [27] describes adoption as the decisions that individuals make each time 
they consider taking up an innovation. Similarly, [14] defines adoption as the decision of an individual to make 
use of an innovation as the best course of action available. The Diffusion of Innovations Theory is very 
comprehensive, and its concepts are highly relevant to technology adoption in developing countries [24]; [28]; 
[29]. The diffusion of innovations is characterized by four elements, namely an innovation, communication 
channels, time, and a social system [13]. 

3.1 Innovation 

[14] described an innovation as an idea, practice, or project that is perceived as new by an individual or 
other unit of adoption. This means that an innovation may have been invented a long time ago, but if individuals 
perceive it as new, it may be an innovation to them. Big Data uptake in Kenya and many other 
developing countries, for instance, is a new practice, and the objective of such projects is to complement surveys 
and traditional databases. Innovation of Big Data from the perspective of developing countries can be understood as all the 
scientific, technological, organizational, financial, and commercial activities necessary to create, implement, and 
market new or improved Big Data products or processes [30]. 

3.2 Communication channels 

Communication is a process in which participants create and share information with one another in 
order to reach a mutual understanding [14]. According to [14], communication channels are the means by 
which a message is transmitted from one person to another. Mass media and interpersonal communication are 
two such communication channels. Mass media channels include media such as TV, radio, and 
newspapers, while interpersonal channels consist of two-way communication between two or more individuals. 
In general, mass media are considered the best channels for creating awareness about innovations, whereas 
interpersonal channels are crucial for persuasion and the final adoption decision. The diffusion of innovation 
model suggests that people change their perceptions of the value of an innovation through communication with others, 
and that these perceptions drive implementation [31]. The innovation-decision process is a sequence over time 
that involves five steps [32]. These are: (i) knowledge, (ii) persuasion, (iii) decision, (iv) implementation and (v) 
confirmation. In other words, potential adopters of a technology progress over time through five stages in the 
diffusion process. First, they must learn about the innovation; second, they must be persuaded of the value of the 
innovation; they then must decide to adopt it; the innovation must then be implemented; and finally, the decision 
must be re-affirmed or rejected. 
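
The five-stage sequence can be sketched as a small ordered data structure. The Python below is a minimal illustration only: the class and function names are hypothetical, and the strictly linear progression is a simplifying assumption that does not model rejection at the decision or confirmation stage.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The five innovation-decision stages, in the order listed in the text."""
    KNOWLEDGE = 1
    PERSUASION = 2
    DECISION = 3
    IMPLEMENTATION = 4
    CONFIRMATION = 5


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after confirmation."""
    if current is Stage.CONFIRMATION:
        return None
    return Stage(current + 1)


def path_to(target: Stage) -> List[Stage]:
    """List every stage an adopter passes through to reach `target`."""
    return [stage for stage in Stage if stage <= target]


# Hypothetical adopter who has just decided to adopt a Big Data tool.
adopter_stage = Stage.DECISION
print([s.name for s in path_to(adopter_stage)])
print(next_stage(adopter_stage).name)
```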

3.3 Time 

Diffusion of innovations theory describes the social process of communication of a new idea among the 
members of a community over time. Therefore the innovation-diffusion process, adopter categorization, and rate 
of adoption all include a time dimension [14]. The time aspect is instrumental in the technical aspects of 
application development, experimentation and training for Big Data projects beyond the innovators and early 
adopters. Successful peer users of Big Data are needed to lead its adoption in the developing countries. 

3.4 Social System 

[14] visualized diffusion as a process of communication during which information flows from one person to 
another, from one social group to another, or from one locale to another. Consequently, when trend setters in a 
social group begin to display or model new technologies such as Big Data to others, they alter the perception of 
what is normative; as a result, the others subsequently begin to adopt such new technologies. 

[14] posited that the innovation-diffusion process is an uncertainty reduction process in which the attributes of 
innovations help to decrease uncertainty about the innovation. The perceived attributes of innovations are 
relative advantage, compatibility, complexity, trialability, and observability. 

(i) Relative advantage indicates the perceived costs and benefits involved in the adoption of an innovation, in 
terms of economic returns, immediacy of reward, social prestige, savings in time and effort [13]. The relative 
advantage of Big Data is indicated by the increasingly important role it plays in key development areas such as 
healthcare, agriculture, biotechnology, education, and environmental monitoring [16]. Numerous projects have 
focused on the growing importance of big data in humanitarian response and in large institutionally driven 
development projects [4]. Big data can offer a new depth of detail on particular issues because it makes it 
possible to show the granular detail of a given problem, and it can also be used to create interactive tools which 
engage readers and lead them to seek a better understanding of a problem. The increased use of digital 
communications such as mobile phones and internet connections, for example, is creating opportunities for using 
Big Data in developing countries. Another example is activism, where advocates are gaining the ability to 
aggregate data submitted by individuals more effectively than before, and to present it in new ways that can 
motivate people to action using social media or dedicated platforms such as Ushahidi [33]. 

(ii) Compatibility is the extent to which an innovation is perceived to match the needs, capacity, values, and 
surrounding social norms of potential adopters [14]. In connection with compatibility, the age of data is upon us: 
the means of its generation are undoubtedly multiplying, the technologies with which to analyze it are 
maturing, and efforts to apply such technologies to address social problems are emerging [34]. 

(iii) Complexity refers to the degree to which an innovation is perceived as relatively difficult to understand and 
use [14]. Excessive complexity of an innovation is an important obstacle to its adoption. Due to its huge size and 
often complex and unstructured nature, Big Data presents several analytical challenges that demand continually 
updated tools and expertise. Analyzing Big Data therefore poses various challenges that are in part 
methodological, or related to interpretation accuracy, methods of analysis, and detection of anomalies [2]. [35] 
stated that Big Data are blossoming and, as a result, there is hope that the knowledge they 
hide can be harnessed to solve the key problems of society, business and science, but turning an ocean of messy data into 
knowledge and wisdom is an extremely difficult task. Legitimate concerns about privacy and the digital divide 
also present new obstacles to harnessing Big Data sets for public benefit [2]. 

(iv) Trialability is the degree to which an innovation may be experimented with on a limited basis before making 
an adoption decision [14]. According to [2], Big Data analytics cannot be taken as a panacea for age-old 
development challenges, nor does real-time information replace the quantitative statistical evidence 
governments traditionally use for decision-making. However, Big Data does have the potential to inform 
whether further targeted investigation is necessary, or to prompt immediate response. Big Data has been tried in 
Kenya. [2] observes that sources of Big Data for development purposes are those which can be 
analyzed to gain insight into human well-being and development, and that they generally share the following features: 
1) Digitally generated: data is created digitally, not digitized manually, and can be manipulated by computers; 2) 
Passively produced: data is a by-product of interactions with digital services; 3) Automatically collected: a system 
is in place that automatically extracts and stores the relevant data that is generated; 4) Geographically or 
temporally trackable: this is the case for mobile phone location data and call duration time; 5) Continuously 
analysed: information is relevant to human well-being and development and can be analysed in real time. 

(v) Observability is the degree to which the results of an innovation are visible to others [14]. It is 
related to the visibility of the results of using a new system in an organization, as well as 
the tangibility of the results of using new technology [36]. Kenya, for instance, launched an open data portal [37] 
featuring information from the government census, as well as economic, health and education data. Other 
applications allow users to select a project in a particular parliamentary constituency and track its budget and 
spending. In sum, issues concerning big data are often covered in public media, scientific journals and conferences 
that discuss the adoption strategies, challenges and impacts of big data. 

According to [14], the rate of adoption of an innovation, measured by the number of adopters, is influenced by 
perceived attributes of innovation, type of innovation-decision, communication channels, nature of the social 
system, and extent of the change agents’ promotion efforts. This view is supported by [38], affirming that these 
factors are associated with information flow and communication in a social system. Thus, the diffusion of 
innovations concepts provide a useful framework for analyzing the diffusion of Big Data and predicting the 
future of these technologies in developing countries. The relevant points that must be understood in applying 
diffusion theory to Big Data are: (1) the perceived attributes of innovations and how the communities in these 
countries perceive Big Data and Analytics; (2) how Big Data and Analytics innovations are communicated and 
shared; and (3) the consequences of Big Data adoption in terms of costs, benefits and socioeconomic impact on 
communities. 

IV. Big data Adoption issues 

There are important issues specifically relevant to adoption of Big Data that must be acknowledged. 
Most developing countries face a number of challenges that may limit their capability to utilize Big Data 
effectively. A low degree of digitization is among the biggest barriers [16]. In many developing countries, there 
are limited hardware, software, and other technology applications to generate and distribute relevant data and 
knowledge [39]. Accurate and actionable data require considerable technical skills to handle data mining and 
analysis methods and systems. [16] observed that lack of human resources and expertise in developing countries 
represents a major barrier to the implementation of Big Data projects. Privacy is another important concern for 
those wishing to explore Big Data, since it has implications for all areas of work, from data acquisition and 
storage to retention, use and presentation [2]. In adopting Big Data, data should be managed as a strategic 
asset within organizations, but this has been hampered by management challenges, for example poor policies 
and directives about the management and protection of the data. There are also barriers to the adoption of Big 
Data that are usually cultural; for instance, many organizations do not implement Big Data programs because 
they do not appreciate how data analysis can enhance their businesses [40]. According to [2], 
although much of the publicly available online data has potential utility for development purposes, private sector 
corporations hold a great deal more data that is valuable for development. Some of these private sector 
corporations may be reluctant to share data due to concerns about competitiveness and their customers’ privacy. 
It is therefore critical to establish a legal framework that defines rules for privacy-preserving 
analysis and protects the competitiveness of the private sector companies willing to share data. 
For example, aggregating data belonging to companies operating in a similar sector into a common data cohort may 
prevent the attribution of a given data set to a specific company, and can preserve privacy. 
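
The idea of pooling data across companies in the same sector can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the company and sector names, and in particular the `min_companies` threshold, which is a simple suppression rule rather than a formal privacy guarantee or a method prescribed by [2].

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (company, sector, value), e.g. a monthly count.
Record = Tuple[str, str, float]


def aggregate_by_sector(records: List[Record], min_companies: int = 3) -> Dict[str, float]:
    """Sum values per sector, releasing only sectors with enough contributors.

    `min_companies` is an assumed threshold: a sector total is released only
    when at least that many distinct companies contributed to it, which makes
    the total harder to attribute to any single company.
    """
    totals: Dict[str, float] = defaultdict(float)
    contributors: Dict[str, Set[str]] = defaultdict(set)
    for company, sector, value in records:
        totals[sector] += value
        contributors[sector].add(company)
    return {
        sector: total
        for sector, total in totals.items()
        if len(contributors[sector]) >= min_companies
    }


# Made-up sample data: three telecom firms and two banks.
sample = [
    ("TelcoA", "telecom", 120.0),
    ("TelcoB", "telecom", 80.0),
    ("TelcoC", "telecom", 100.0),
    ("BankA", "banking", 50.0),
    ("BankB", "banking", 70.0),
]
print(aggregate_by_sector(sample))
```

With the default threshold, only the telecom total is released, because the banking cohort has just two contributors.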

V. Lessons learnt 

The number of big data projects and of people using big data is increasing in developing countries, 
indicating a diffusion of big data technologies [2]. Diffusion of innovations can greatly accelerate adoption and 
utilization of Big Data, even though developing countries face challenges that limit their capability 
to utilize these technologies effectively. The Big Data adoption issues highlighted imply a need 
for strategies to promote technology diffusion, including information dissemination, development of legal 
frameworks to deal with matters such as privacy concerns, cultural change, increasing digitization levels and 
investment in human capital, particularly technical skills development. 

VI. Conclusion 

The diffusion theory concepts provide an effective mechanism for policy leaders in developing 
countries to maximize adoption of Big Data innovations, and can also be useful in informing 
policy implementers on how to increase adoption rates. In less developed countries, hardware, software, and 
other technology applications for generating and distributing Big Data and knowledge are limited, 
technical experts able to handle data mining and analysis are few, and big data projects and technological 
infrastructure are scarce. In such settings, using a tested theory such as the diffusion of innovations may be a 
vital component of the successful adoption and utilization of Big Data and Analytics. 